Bacteriophages are the most abundant biological life forms on Earth. However, relatively little is 
known regarding which bacteriophages infect and exploit which bacteria. A recent meta-analysis 
showed that empirically measured phage-bacteria infection networks are often significantly nested, 
on average, and not modular. A perfectly nested network is one in which phages can be ordered 
from specialist to generalist such that the host range of a given phage is a subset of the host range 
of the subsequent phage in the ordering. The same meta-analysis hypothesized that modularity, in 
which groups of phages specialize on distinct groups of hosts, should emerge at larger geographic 
and/or taxonomic scales. In this paper, we evaluate the largest known phage-bacteria interaction 
data set, representing the interaction of 215 phage types with 286 host types sampled from 
geographically separated sites in the Atlantic Ocean. We find that this interaction network is highly 
modular. In addition, some of the modules identified in this data set are nested or contain 
submodules, indicating the presence of multi-scale structure, as hypothesized in the earlier 
meta-analysis. We examine the role of geography in driving these patterns and find evidence that the host 
range of phages and the phage permissibility of bacteria are driven, in part, by geographic separation. 
We conclude by discussing approaches to disentangle the roles of ecology and evolution in driving 
complex patterns of interaction between phages and bacteria. 
Introduction 

Bacteriophages can have a significant effect on 
microbial communities and ecosystems (Wilhelm 
and Suttle, 1999; Wommack and Colwell, 2000; 
Suttle, 2005, 2007; Brussaard et al., 2008). 
Bacteriophages are responsible for a significant fraction of 
bacterial mortality (Suttle and Chan, 1994; 
Weinbauer, 2004), engage in coevolutionary arms 
races with their hosts (Buckling and Rainey, 2002; 
Andersson and Banfield, 2008; Held and Whitaker, 
2009; Marston et al., 2012), and redirect organic 
material to the microbial loop via a process known 
as the viral shunt (Wilhelm and Suttle, 1999; 
Middelboe and Lyck, 2002; Jiao et al., 2010). A key 
event in all of these ecological functions is the 
interaction with and exploitation of a bacterium by a 
phage. It is widely hypothesized that phages can 
infect a very limited subset of bacteria in a given 
environment. However, given the high diversity of 
bacteria in natural environments (Rusch et al., 2007; 
Quince et al., 2008), even infecting a limited subset 
can nonetheless represent a heterogeneous range of 
hosts. Indeed, there is a long record of evidence to 
suggest that phages commonly infect multiple 
distinct bacterial types in natural environments 
(for example, Wichels et al., 1998; Holmfeldt et al., 
2007), including examples where individual phages 
can infect hosts from distinct genera (for example, 
cyanophages infecting hosts from Prochlorococcus 
and Synechococcus (Sullivan et al., 2003)). 
Recently, we utilized a network-based approach in 
order to identify and characterize patterns within 
published data sets of infection and exploitation of 
bacteria by phages (Flores et al., 2011). 

The key interaction patterns examined in Flores 
et al. (2011) were nestedness (Rodriguez-Girones and 
Santamaria, 2006; Ulrich and Gotelli, 2007; Almeida-Neto 
et al., 2008; Ulrich et al., 2009) and modularity 
(Newman, 2006b; Barber, 2007). In the context of 
phage-bacteria interactions, nestedness indicates the 
extent to which the host ranges of phages are subsets 
of one another. In a maximally nested network, the 
most specialized phage could infect only the host most 
permissive to infection. Then, the next most specialized 
phage could infect the host most permissive to 
infection as well as one additional host, and so on. 
Nestedness is thought to emerge in coevolutionary 
arms race dynamics in which hosts evolve resistance 
to current and past pathogens, while pathogens 
evolve counter resistance that enables them to infect 
past hosts (Agrawal and Lively, 2002), for example, as 
observed between the bacterium Pseudomonas fluorescens 
SBW25 and the DNA phage SBW25Φ2 (Buckling 
and Rainey, 2002). Similarly, modularity indicates 
the extent to which interactions, in this case an 
infection of a bacterium by a phage, can be partitioned 
into groups with many interactions within them 
and few interactions between them. These groups are 
referred to as modules. In a maximally modular 
network, there would be no cross-infections between 
phages of one module and hosts of another module. 
There are many possible drivers of modularity, 
including geographic isolation, which can facilitate 
the divergent coevolution of interacting species 
(Thompson, 1999; Gomez and Buckling, 2011). 

In our re-analysis of published studies, we found 
that infection networks tended to be nested and not 
modular (Flores et al., 2011). However, we hypothesized 
that modularity should be expected when a 
greater diversity of bacteria and phages interact. The 
work described here follows up on our earlier study 
by analyzing a previously published cross-infection 
data set (Moebus and Nattkemper, 1981) not 
included in our earlier analysis. The Moebus and 
Nattkemper (1981) data set is the largest phage- 
bacteria infection network available in the literature 
(as far as we are aware), representing interactions 
between marine phages and bacteria in the Atlantic 
Ocean. The data set contains cross-infection and 
geographic information but no sequence informa- 
tion. As such, we focus our analysis on the 
following questions: (i) how do patterns of infection 
change at different scales, that is, when examining 
the entire network (large scale) vs subcomponents of 
the network (small scale); (ii) what role does 
geographic separation have in shaping cross-infection? 
Despite the cosmopolitan nature of viruses 
(Breitbart et al., 2004; Angly et al., 2006; for an 
exception see Desnues et al., 2008), multiple lines 
of evidence suggest that phages are often better 
adapted to hosts from the same location than they 
are to hosts from a different location (Held and 
Whitaker, 2009; Vos et al., 2009; Gomez and 
Buckling, 2011; Koskella et al., 2011). Hence, by 
examining explicit cross-infections among many 
microbes isolated across a large geographic range, 
we hope to shed light on the structure of phage- 
bacteria infection networks. 



Materials and methods 

Data set 

We analyzed the cross-infection data set of Moebus 
and Nattkemper (1981). These data include phages and 
bacteria collected from February to April 1979 in the 
Atlantic Ocean between the European continental 
shelf and the Sargasso Sea (Moebus, 1980). Bacteria 
were cultured and isolated using seawater-based 
media and bacteriophages were enriched from the 
same water sample (Moebus, 1980). In the original 
analysis of cross-infection (Moebus and Nattkemper, 
1981), the authors describe cross-reaction tests 
among 733 bacteria and 258 phage strains collected 
at 48 stations separated, in some cases, by some 200 
miles (Supplementary Figure S1). However, 
the authors do not report results for strains that 
both (i) had identical infection patterns and (ii) 
were isolated from the same station. The reported 
data set is included as a fold-out table in the main 
text (see Supplementary Figure S2). We digitized 
and automatically extracted the positive infection 
results and then manually curated the results, 
yielding a network of 286 bacterial strains and 215 
phage strains with 1332 positive infection outcomes 
out of a possible 61490 = 215 × 286 interactions (see 
Supplementary Text S1 for more details). The 
interactions were classified in the original study as 
either (i) 'More or less clear spots due to lysis of 
bacteria' or (ii) 'More or less turbid spots'. We 
classified all interactions as either positive (either 
clear or turbid spots) or negative (neither clear 
nor turbid spots). We refer to this data set as the MN 
(Moebus and Nattkemper) matrix. The resulting 
digitized data set is shown in Figure 1. 
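
As a minimal sketch of the binarization and of the size arithmetic quoted above, the following Python snippet collapses spot-test scores into a 0/1 matrix and recomputes the connectance. The numeric codes (0, 1, 2) and the tiny `raw_scores` array are hypothetical placeholders and are not taken from the original data; only the counts 286, 215 and 1332 come from the text.

```python
import numpy as np

# Hypothetical spot-test codes for 2 hosts x 3 phages (illustration only):
# 0 = no spot, 1 = clear spot, 2 = turbid spot.
raw_scores = np.array([[0, 1, 2],
                       [0, 0, 1]])

# Clear and turbid spots are both treated as a positive interaction.
binary = (raw_scores > 0).astype(int)

# Counts reported for the MN matrix.
n_hosts, n_phages, n_positive = 286, 215, 1332
size = n_hosts * n_phages           # number of possible host-phage pairs
connectance = n_positive / size     # fraction of pairs with a positive outcome

print(size, round(connectance, 4))
```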

Network analysis 

Disjoint components. An interaction network is 
considered bipartite when it contains two types of 
agents that interact, for example, bacteria and 
phages. Any bipartite network can be decomposed 
into disjoint components such that no cross-infections 
are found between components. Formally, 
each disjoint component in a bipartite network of 
host-viral cross-infection is defined in terms of 
a set of hosts, H, and a set of viruses, V, such that: (i) 
there is no virus V' outside of V that can infect any 
host in H; (ii) there is no host H' outside of H 
that can be infected by any virus in V; (iii) for each 
virus in V there is at least one host in H that it can 
infect. 
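
The decomposition defined above can be sketched with a breadth-first search over the binary infection matrix. This is an illustrative implementation, not the authors' code; the function name `disjoint_components` and the toy matrix are assumptions, and hosts or phages with no interactions are simply left out of every component.

```python
from collections import deque

import numpy as np


def disjoint_components(matrix):
    """Return a list of (hosts, phages) index sets, one per disjoint component.

    Rows are hosts and columns are phages; a nonzero entry means infection.
    Hosts and phages without any interaction are not assigned to a component.
    """
    B = np.asarray(matrix) != 0
    n_hosts, n_phages = B.shape
    seen_hosts = np.zeros(n_hosts, dtype=bool)
    seen_phages = np.zeros(n_phages, dtype=bool)
    components = []
    for start in range(n_hosts):
        if seen_hosts[start] or not B[start].any():
            continue
        hosts, phages = set(), set()
        queue = deque([("host", start)])
        seen_hosts[start] = True
        while queue:
            kind, idx = queue.popleft()
            if kind == "host":
                hosts.add(idx)
                # Every phage infecting this host belongs to the same component.
                for j in np.flatnonzero(B[idx]):
                    if not seen_phages[j]:
                        seen_phages[j] = True
                        queue.append(("phage", int(j)))
            else:
                phages.add(idx)
                # Every host infected by this phage belongs to the same component.
                for i in np.flatnonzero(B[:, idx]):
                    if not seen_hosts[i]:
                        seen_hosts[i] = True
                        queue.append(("host", int(i)))
        components.append((hosts, phages))
    return components


# Toy matrix (hypothetical): two components plus one isolated host and phage.
toy = [[1, 1, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 0]]
components = disjoint_components(toy)
```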

Modularity. We used the standard BRIM (Bipartite 
Recursively Induced Modules) algorithm (Barber, 
2007), which utilizes a local search heuristic to 
maximize a bipartite modularity value Q (see 
Supplementary Text S2 for more details). The value 
of Q represents the extent to which a particular assignment of 
phages and bacteria to modules corresponds to 
interactions that are primarily inside modules 
(Q ≈ 1 or modular), primarily outside of modules 
(Q ≈ −1 or antimodular) or somewhere in between 
(−1 < Q < 1). BRIM helps find the arrangement of 
phages and bacteria in modules that maximizes Q. 
We used two different variants of the BRIM 
algorithm depending on the size of the matrix. For 
the entire matrix, we extended the BRIM algorithm 
to first partition the network into different isolated 
modules and then recursively subdivide 
the network, as has been done in the case of 
unipartite networks (Newman, 2006a, b), that is, 
networks with only one type of node. Our approach 
(described in Supplementary Text S2) yields higher 
values of Q than both BRIM and LP-BRIM (Liu and 
Murata, 2009). Within each module, we used the 
adaptive heuristic of the BRIM algorithm (Barber, 
2007), which has been verified to perform well in 
small matrices (Liu and Murata, 2009). 

Figure 1 Digitized version of the MN matrix with 286 hosts (rows) and 215 phages (columns) in the same orientation as originally 
published (Moebus and Nattkemper, 1981). The 1332 black cells represent positive interactions between hosts and phages (see Materials 
and methods). The connectance of the network (interactions/total size) is approximately 0.022 ≈ 1332/61490. 
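
To make the quantity Q concrete, the sketch below evaluates a bipartite modularity for a given assignment of hosts and phages to modules. It assumes the standard definition of Barber (2007), Q = (1/E) Σ_ij (B_ij − k_i d_j / E) δ(g_i, h_j), where E is the number of interactions; the exact form used in Supplementary Text S2 is not reproduced here, and the function name and toy matrix are hypothetical. The snippet only scores a partition and does not perform the BRIM search itself.

```python
import numpy as np


def bipartite_modularity(matrix, host_modules, phage_modules):
    """Bipartite modularity Q of a given module assignment (assumed Barber form)."""
    B = (np.asarray(matrix) != 0).astype(float)
    n_edges = B.sum()
    k = B.sum(axis=1)                       # host degrees
    d = B.sum(axis=0)                       # phage degrees
    expected = np.outer(k, d) / n_edges     # expected weight under the degree null
    g = np.asarray(host_modules)
    h = np.asarray(phage_modules)
    same_module = g[:, None] == h[None, :]  # True where host and phage share a module
    return float(((B - expected) * same_module).sum() / n_edges)


# Toy example (hypothetical): two perfectly separated 2 x 2 blocks.
toy = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
q_blocks = bipartite_modularity(toy, [0, 0, 1, 1], [0, 0, 1, 1])
q_single = bipartite_modularity(toy, [0, 0, 0, 0], [0, 0, 0, 0])
```

In this toy case the two-block assignment scores higher than lumping everything into one module, which is the comparison a modularity-maximizing heuristic relies on.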

Nestedness. We utilized two algorithms to measure 
the extent to which host-phage interactions 
have a nested pattern. 

Nestedness temperature calculator. The nestedness 
temperature calculator (NTC) algorithm was 
originally developed by Atmar and Patterson (1993) 
and has been reviewed elsewhere (Rodriguez-Girones 
and Santamaria, 2006). In the present 
context, the 'temperature', T, of an interaction matrix 
is estimated by re-sorting the row order of hosts and 
the column order of phages such that as many of the 
interactions as possible occur in the upper left portion of the 
matrix. In doing so, the value of T quantifies 
the extent to which interactions only take place in 
the upper left (T ≈ 0), or are equally distributed 
between the upper left and the lower right (T ≈ 100). 
Perfectly nested interaction matrices can be re-sorted 
to lie exclusively in the upper left portion and hence 
have a temperature of 0. The value of temperature 
depends on the size, connectance and structure of 
the network. Because the temperature value quantifies 
departures from perfect nestedness, we define 
the nestedness, $N_{NTC}$, of a matrix to range from 0 to 1, 
$N_{NTC} = (100 - T)/100$, such that $N_{NTC} = 1$ when T = 0 
(perfect nested pattern) and $N_{NTC} = 0$ when T = 100 
(chessboard pattern). 
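
The two simplest ingredients of this procedure can be sketched as follows: a degree-based re-sorting that packs interactions toward the upper left, and the conversion from temperature to nestedness. The temperature calculation itself (which relies on an isocline of perfect nestedness) is not implemented here; sorting by decreasing degree is an assumed orientation choice, and both function names are hypothetical.

```python
import numpy as np


def sort_by_degree(matrix):
    """Reorder rows and columns by decreasing degree (ties keep original order)."""
    B = (np.asarray(matrix) != 0).astype(int)
    row_order = np.argsort(-B.sum(axis=1), kind="stable")
    col_order = np.argsort(-B.sum(axis=0), kind="stable")
    return B[np.ix_(row_order, col_order)]


def nestedness_from_temperature(temperature):
    """Map a temperature T in [0, 100] to N_NTC = (100 - T) / 100."""
    return (100.0 - temperature) / 100.0


# Toy matrix (hypothetical) that becomes perfectly triangular after sorting.
toy = [[0, 0, 1],
       [1, 1, 1],
       [0, 1, 1]]
packed = sort_by_degree(toy)
```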

Nestedness metric based on overlap and decreasing 
filling. NODF is a nestedness metric introduced by 
Almeida-Neto et al. (2008). NODF is independent of 
row and column order. This algorithm measures the 
nestedness across hosts by assigning a value $M_{ij}^{row}$ to 
each pair i, j of hosts (rows) in the interaction 
matrix, which is defined as: 

$$M_{ij}^{row} = \begin{cases} 0 & \text{if } k_i = k_j \\ n_{ij} / \min(k_i, k_j) & \text{otherwise} \end{cases} \qquad (1)$$

where $k_i$ and $k_j$ are the degrees of hosts i and j, 
respectively, and $n_{ij}$ is the number of common 
interactions between them. 'Degree' is a standard 
network science term that is defined as the number 
of interactions that a given type has (Newman, 
2010). For example, in this context, the degree of a 
host is the number of viruses that can infect it and 
the degree of a virus is the number of hosts it can 
infect. The same method is used to calculate 
nestedness across phages, such that the total 
nestedness value is: 



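$$N_{NODF} = \frac{\sum_{i<j} M_{ij}^{row} + \sum_{i<j} M_{ij}^{col}}{\dfrac{H(H-1)}{2} + \dfrac{P(P-1)}{2}} \qquad (2)$$

where $M_{ij}^{row}$ and $M_{ij}^{col}$ are the pairwise values for hosts (rows) and 
phages (columns), H is the number of hosts and P is the number of phages. 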



The meaning of nestedness as calculated by NODF 
is that higher values denote matrices in which (i) pairs 
of rows are typically subsets of each other, that is, 
host pairs share some, but not all, viruses that can 
infect them; (ii) pairs of columns are typically 
subsets of each other, that is, viral pairs share some, 
but not all, hosts that they can infect. 
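
A direct, unoptimized sketch of this metric is given below. It assumes the pairwise overlap is the number of shared interactions divided by the smaller of the two degrees (zero when the degrees are equal), and that the total is normalized by the number of row pairs plus the number of column pairs, giving a value between 0 and 1. Skipping pairs that involve a zero-degree row or column is an added guard against division by zero, not something stated in the text; the function names are hypothetical.

```python
import numpy as np


def paired_overlap(rows):
    """Sum of pairwise overlap values over all pairs i < j, and the pair count."""
    rows = np.asarray(rows, dtype=int)
    degree = rows.sum(axis=1)
    n = rows.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            smaller = min(degree[i], degree[j])
            # Equal degrees contribute zero; empty rows are skipped (added guard).
            if degree[i] == degree[j] or smaller == 0:
                continue
            shared = int((rows[i] & rows[j]).sum())
            total += shared / smaller
    return total, n * (n - 1) / 2


def nodf(matrix):
    """NODF-style nestedness across hosts (rows) and phages (columns)."""
    B = (np.asarray(matrix) != 0).astype(int)
    row_sum, row_pairs = paired_overlap(B)
    col_sum, col_pairs = paired_overlap(B.T)
    return (row_sum + col_sum) / (row_pairs + col_pairs)


# Toy matrices (hypothetical): a perfectly nested triangle and an identity matrix.
nested_value = nodf([[1, 1, 1], [1, 1, 0], [1, 0, 0]])
identity_value = nodf([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
```

The triangular matrix attains the maximum because every pair of rows (and columns) has unequal degree and full overlap, whereas the identity matrix scores zero because all degrees are equal.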

Null models. We utilized two null models in order 
to measure the statistical significance of modularity 
and nestedness. The first is a Bernoulli random null 
model in which the null matrix has the same total 
number of interactions as the original matrix, albeit 
randomly positioned. The second is a probabilistic 
degree null model in which each interaction 
between host i and phage j in the null matrix is 
assigned with a probability according to: 



$$P_{ij} = \frac{1}{2}\left(\frac{k_i}{P} + \frac{d_j}{H}\right) \qquad (3)$$



where the degree ki is the number of phages that 
infect host i, the degree dj is the number of hosts 
infected by phage j, P is the number of phages and H 
is the number of hosts. In all cases, we utilize 
100,000 random matrices to evaluate the statistical 
significance of modularity and nestedness. Finally, 
given the two null models, we evaluate modularity 
using two significance tests, and we evaluate nestedness 
using four significance tests (two each for the 
NTC and NODF). 
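
Both null models can be sketched in a few lines. The Bernoulli model below scatters the observed number of interactions uniformly at random; the probabilistic degree model assumes the interaction probability is the average of the host's fill fraction and the phage's fill fraction, P_ij = (k_i/P + d_j/H)/2. The empirical P-value is taken as the fraction of null values exceeding the observed one. Function names, the seed and the toy matrix are hypothetical.

```python
import numpy as np


def bernoulli_null(matrix, rng):
    """Random matrix with the same shape and the same number of interactions."""
    B = np.asarray(matrix) != 0
    flat = np.zeros(B.size, dtype=int)
    flat[rng.choice(B.size, size=int(B.sum()), replace=False)] = 1
    return flat.reshape(B.shape)


def probabilistic_degree_null(matrix, rng):
    """Random matrix whose cell probabilities follow the row and column degrees."""
    B = (np.asarray(matrix) != 0).astype(float)
    n_hosts, n_phages = B.shape
    k = B.sum(axis=1)   # number of phages infecting each host
    d = B.sum(axis=0)   # number of hosts infected by each phage
    prob = 0.5 * (k[:, None] / n_phages + d[None, :] / n_hosts)
    return (rng.random(B.shape) < prob).astype(int)


def empirical_p_value(observed, null_values):
    """Fraction of null realizations with a value larger than the observed one."""
    return float((np.asarray(null_values) > observed).mean())


rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
toy = np.array([[1, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 1, 1]])
null_a = bernoulli_null(toy, rng)
null_b = probabilistic_degree_null(toy, rng)
```

In practice one would draw many such matrices, recompute the statistic of interest on each, and pass the resulting values to `empirical_p_value`.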



Multi-scale analysis 

Nestedness metrics may overestimate the statistical 
significance of nestedness, particularly when the 
fraction of realized interactions of a network 
becomes either very large or very small (see, for example, 
Fischer and Lindenmayer, 2002). In addition, in 
cases where a network is composed of nested 
modules, we expect that some nestedness measures 
will spuriously identify the entire network as nested 
(see, for example, Figure 7 of Flores et al. (2011)). We 
developed two approaches to characterize nestedness 
given a large, sparsely connected network. 
These two approaches are consistent with recent 
calls to take a local, rather than a strictly global, 
approach to identifying community structure 
(Fortunato and Barthelemy, 2007). First, in the case 
of nestedness as calculated using NTC, we identify 
modules in the original matrix, and then constrain 
the row/column re-ordering so that rows and 
columns cannot break the modular structure. Hence, 
we still sort the rows and columns, but only inside 
modules. In addition, we permit random permuta- 
tions of the modular blocks along the main matrix 
diagonal and select the configuration that minimizes 
temperature (maximizes nestedness). Second, in the 
case of nestedness as calculated using NODF, we 
again identified modules and then restricted the 
comparisons of overlap to rows and columns across 
modules. In this way, we can evaluate the overall 
nestedness of the original matrix without consider- 
ing the nestedness contribution that comes from 
inside of modules. More details are found in 
Supplementary Text S3. 
Geographic analysis 

Modules identified in our network analysis include 
hosts and phages collected at potentially different 
sample sites. The sample site of each phage and host 
corresponds to different 'stations' in the Atlantic 
Ocean. We estimated the geographic diversity of 
stations within a given module using Shannon ($H_k$) 
and Simpson ($D_k$) indices (Shannon, 1948; Simpson, 
1949), where the subscript k denotes the module 
number. Both indices measure the variability in the 
stations of isolation of phages and hosts within a 
given module. In addition, both indices were 
applied to hosts and phages separately. The diver- 
sity indices of a given module are: 

$$H_k = -\sum_{i=1}^{R} \frac{n_i}{N} \log \frac{n_i}{N}, \qquad D_k = 1 - \sum_{i=1}^{R} \left(\frac{n_i}{N}\right)^2 \qquad (4)$$

where N is the number of different strains inside 
the module, R is the number of stations inside the 
module, and $n_i$ is the number of strains from 
station i. Low values in both indices indicate low 
geographical diversity. We determined the significance 
of a measured diversity value by comparing 
observations with an ensemble of randomized 
assignments of station labels to modules 
(see Supplementary Text S4 for details). 
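
The two indices can be sketched as below, assuming the Shannon index uses the natural logarithm and the Simpson index takes the form 1 − Σ (n_i/N)², so that both are zero when all strains in a module come from a single station. The function name and the station labels in the example are hypothetical, and the randomization test of Supplementary Text S4 is not reproduced.

```python
import math
from collections import Counter


def station_diversity(stations):
    """Shannon and Simpson diversity of the station labels of one module."""
    counts = Counter(stations)
    total = sum(counts.values())
    fractions = [n / total for n in counts.values()]
    shannon = -sum(p * math.log(p) for p in fractions)
    simpson = 1.0 - sum(p * p for p in fractions)
    return shannon, simpson


# Hypothetical modules: one drawn from a single station, one split evenly in two.
single_station = station_diversity([7, 7, 7, 7])
two_stations = station_diversity([7, 7, 12, 12])
```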

Results 

Characteristics of a large-scale phage-bacteria infection 
network 

The network properties of the MN phage-host 
infection data set are shown in Table 1. We find 
that only a small percentage of the cross-infections 
yield a positive result (2.17% = 1332/61490), in 
contrast to a previous meta-analysis where many 
cross-infections yielded positive results (36.6% 
= 4365/11944) (Flores et al., 2011). However, in 
agreement with the prior meta-analysis we find that 
phages can infect multiple hosts (average of 6.20, 
median of 4 in the present study; average of 8.75, 
median of 6 in the prior meta-analysis). Similarly, 
we find that hosts are infected by multiple phages 
(average of 4.66, median of 3 in the present study; 
average of 4.34, median of 3 in the prior 
meta-analysis). These averages and medians were 
calculated over all strains in the current study and by 
aggregating strains from the prior analysis. 
Importantly, the degree distribution of this network is not 
unimodal, that is, it does not have a single peak. 
Instead, we find long-tailed distributions of the 
number of hosts that a phage can infect, and 
similarly, the number of phages that can infect a 
host (see Supplementary Figure S3). Hence, there 
exists a spectrum of viral types spanning specialists to 
generalists; we find there are many more specialist 
than generalist viral types in this study. Similarly, 
hosts span a spectrum of types from permissive to 
resistant; we find there are many more resistant 
types than permissive types in this study. 

Table 1 General properties of the curated phage-bacteria 
interaction network 

General properties   Definition                Value
Nc                   Number of components      38
H                    Number of hosts           286
P                    Number of phages          215
I                    Number of interactions    1332
S = H + P            Number of species         501
M = HP               Size                      61490
C = I/M              Connectance or fill       0.0217

Host interactions
LH = I/H             Mean host degree          4.6573
Max(k_i)             Max host degree           20
Min(k_i)             Min host degree           1

Phage interactions
LP = I/P             Mean phage degree         6.1953
Max(d_i)             Max phage degree          31
Min(d_i)             Min phage degree          1




Evaluating modularity at the whole-network scale 

The MN matrix comprises 38 disjoint components, 
that is, sets of phages and bacteria that have 
cross-infections within a component but no 
cross-infections between components (see Figure 2). Given 
the finding of disjoint components, we expect that 
the MN matrix is significantly modular. We confirm 
this via a modularity analysis using the BRIM 
algorithm in which we identify 49 separate modules 
(see Supplementary Table S2). The 49 modules 
include the subdivision of some of the 38 disjoint 
components as identified in the BRIM analysis such 
that the overall modularity value Q is increased. 
These results enable in-depth resolution of the 
specialization within the system, in contrast to the 
conclusion by Moebus and Nattkemper (1981) via 
visual inspection that 'two large groups of 
bacteriophage-host systems were encountered' and '8 small 
ones were found'. Figure 3 shows the modularity 
sorting of the MN matrix resulting from the BRIM 
algorithm, in which rows and columns inside 
modules were sorted in order to highlight the 
possible nested structure within modules. 
Remarkably, 1219/1332 = 91.52% of the interactions occur 
within modules rather than between modules. The 
calculated modularity of the MN matrix (Q = 0.7950) 
is larger than any of the 10^5 realizations in either null 
model (P < 10^-5, which is a conservative upper 
bound). As a point of reference, the highest value of 
any of the random matrices was Q = 0.4503. The 
Z-score, representing the number of standard 
deviations by which the observed modularity exceeds the 
mean of the random ensemble, was 87.55 using the 
Bernoulli null model 
and 51.02 using the probabilistic degree null model. 
It is important to note that although most interactions 
occur within a module, these modules include 
phages and bacteria from multiple stations. Hence, 
we find that 76% (≈ 1012/1332) of infections 
transcend the site of isolation (see Supplementary File 1 
and subsequent section on geographic analysis). 

Figure 2 Network representation of the study. We observe 38 isolated components. Black nodes represent phages, and white nodes 
represent hosts. The station IDs of each host and phage are contained in the center of each node. 
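
The summary statistics quoted in this section can be sketched as follows: the fraction of interactions that fall within modules, and a Z-score of an observed value against a null ensemble. This is an illustration under assumed conventions (the sample standard deviation is used for the Z-score); the function names and toy inputs are hypothetical.

```python
import numpy as np


def fraction_within_modules(matrix, host_modules, phage_modules):
    """Fraction of interactions whose host and phage share a module label."""
    B = np.asarray(matrix) != 0
    g = np.asarray(host_modules)
    h = np.asarray(phage_modules)
    same_module = g[:, None] == h[None, :]
    return float((B & same_module).sum() / B.sum())


def z_score(observed, null_values):
    """Standard deviations by which the observed value exceeds the null mean."""
    null_values = np.asarray(null_values, dtype=float)
    return float((observed - null_values.mean()) / null_values.std(ddof=1))


# Toy example (hypothetical): 3 of 4 interactions lie within modules.
toy_fraction = fraction_within_modules([[1, 1, 0], [0, 1, 1]], [0, 1], [0, 0, 1])
toy_z = z_score(10.0, [1.0, 2.0, 3.0])

# Percentages reported in the text, recomputed from the quoted counts.
within_modules_pct = round(100 * 1219 / 1332, 2)
across_sites_pct = round(100 * 1012 / 1332)
```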



Evaluating nestedness at the whole-network scale 

We evaluated the nestedness of the MN matrix 
using a combination of algorithms and null models. 
First, we re-sorted the rows and columns in order of 
increasing degree, a heuristic that tends to maximize 
nestedness using the temperature calculator. 
Visually, it would seem that the MN matrix is not 
nested (see Figure 3 and Supplementary Figure S4). 
We showed in a previous study that a community of 
nested modules can lead to apparent nestedness at 
the whole-matrix scale (Flores et al., 2011). Indeed, 
for the four nestedness tests (two null models and 
two algorithms) we find that the MN matrix is 
apparently significantly nested in all cases except 
for the NODF algorithm using the probabilistic 
degree null model. We argue that the apparent 
finding of nestedness is driven by the fact that the 
matrix contains nested modules, rather than a 
nested arrangement of hosts and phages spanning 
the entire matrix. We applied a multi-scale network 
analysis to evaluate this hypothesis (see Materials 
and methods and Supplementary Text S3). The 
results of the conventional and multi-scale 
nestedness analysis are summarized in Table 2. The 
multi-scale analysis enables us to reject the finding 
of nestedness for both algorithms when using the 
probabilistic degree null model. Nestedness can also 
be rejected even in the case of the Bernoulli null 
model for NODF and for one of the multi-scale 
analysis methods using NTC. 

Figure 3 Modularity sorting of the network. We detect 49 modules (shaded rectangles). The 15 largest modules discussed in the main 
document begin at the left of the matrix. Black symbols represent those interactions within a module. Gray symbols represent those 
occurring between modules. The P-value for the observed modularity is smaller than 10^-5. 

Table 2 Significance of the nestedness of the MN matrix using alternative algorithms 

                           NTC algorithm                                NODF algorithm
Analysis                   N        Bernoulli   Probabilistic degree    N        Bernoulli   Probabilistic degree
Normal analysis            0.9541   P<1e-5      P<1e-5                  0.0341   P<1e-5      P=0.2336
Multi-scale analysis (1)   0.9359   P<1e-5      P=1                     0.0062   P=1         P=1
Multi-scale analysis (2)   0.9263   P<1e-5      P=1
Multi-scale analysis (3)   0.8568   P=1         P=1

Abbreviations: MN matrix, Moebus and Nattkemper matrix; NODF, nestedness metric based on overlap and decreasing filling; NTC, nestedness 
temperature calculator. The P-value denotes the fraction of random matrices that have a larger value of nestedness, N, than the observed MN 
matrix. In the 'normal' analysis, the NTC and NODF algorithms are used to estimate nestedness using alternative null models (see 
Materials and methods). For the multi-scale analysis three values have been reported for analyzing the significance of nestedness using the NTC 
algorithm: (1) Modules are sorted according to the sort heuristic described in Supplementary Text S3; (2) Modules are sorted in descending order 
of the number of phages; (3) Modules are sorted in ascending order of the number of phages. See Supplementary Figure S6 for the details of 
sorting. Note that the values of nestedness can differ depending on the algorithm used; it is their value relative to the null model that determines 
significance. 



Network analysis at the intra-module scale 
We performed a network analysis of the 15 largest 
modules extracted from the modularity sort (see 
Table 3 for summary statistics and Supplementary 
Table S2 for information on all 49 modules). Figures 
4 and 5 show the modularity and nestedness sorting, 
respectively. We detected that 9/15 modules are 
statistically modular in at least one of the two null 
models, whereas 5/15 are modular using both of the 
null models. In addition, we find that 8/15 of the 
modules are statistically nested in at least one 
combination of NTC/NODF vs Bernoulli/Probabilistic 
degree null models. The fact that 8 of 15 
modules are statistically nested in at least one case 
is an indication that nestedness is present at smaller 
scales. This supports the hypothesis that modularity 
may be characteristic at large scales (the scale of 
the entire network), whereas nestedness may be 
observed at small scales (at the scale of an 
individual module) (Flores et al., 2011). However, 
here we note that small-scale structure includes 
nestedness and modularity. 



Geographical diversity of interactions 

We find that, on average, there is less geographic 
diversity in each of the largest 15 modules identified 
in Figure 3 than would be expected by chance. The 
result of the geographic diversity test is shown in 
Figure 6. Specifically for phages, 11 of 15 modules 
exhibit statistically significantly lower diversity than is 
expected by chance using Simpson diversity, and 12 
of 15 modules are found to be statistically significant 
when using Shannon diversity (see Supplementary 
Figure S7 and Supplementary Table S3). Moreover, 
the two largest modules have lower geographic 
diversity of phages than average, but not significantly 
lower than might be expected by chance. Similar 
results hold for hosts, where 10 of 15 modules exhibit 
statistically significantly lower diversity using Simpson 
and 11 of 15 using Shannon diversity (again see 
Supplementary Figure S7). These results imply that 
phages and hosts from the same subset of stations 
are overrepresented within modules. 
However, it is important to point out that 
this data set includes many positive infections (1012 
of 1332) of hosts by phages that were not isolated 
from the same sample site. 

Table 3 Network properties of the largest 15 modules identified 
using the modularity analysis (see Table 1 for definitions of all 
quantities) 

No.      H       P       I       S       M        C      L_H    L_P
1        42      23      269     65      966      0.28   6.40   11.70
2        39      12      138     51      468      0.29   3.54   11.50
3        31      31      233     62      961      0.24   7.52   7.52
4        23      13      61      36      299      0.20   2.65   4.69
5        16      20      114     36      320      0.36   7.13   5.70
6        15      5       30      20      75       0.40   2.00   6.00
7        12      7       27      19      84       0.32   2.25   3.86
8        11      8       52      19      88       0.59   4.73   6.50
9        8       6       38      14      48       0.79   4.75   6.33
10       8       11      57      19      88       0.65   7.13   5.18
11       7       5       15      12      35       0.43   2.14   3.00
12       7       7       17      14      49       0.35   2.43   2.43
13       7       9       49      16      63       0.78   7.00   5.44
14       6       7       21      13      42       0.50   3.50   3.00
15       6       6       27      12      36       0.75   4.50   4.50
Mean     15.87   11.33   76.53   27.20   241.47   0.46   4.51   5.82
Median   11      8       49      19      84       0.40   4.50   5.44

Figure 4 Modular sort of the internal structure of the 15 largest modules, in the same order as they appear in Figure 3. The significance 
of modularity is denoted as follows: A/a = statistically modular/antimodular using Bernoulli null model, B/b = statistically modular/ 
antimodular using probabilistic degree null model. X = not significantly modular or antimodular. 

Figure 5 Nestedness sort of the 15 largest modules. The gray line represents the isocline of the NTC algorithm. A/B = statistically nested 
using NTC and Bernoulli/probabilistic degree null model, C/D = statistically nested using NODF and Bernoulli/probabilistic degree null 
model. X = no significance was found. 

Figure 6 Geographical representation of the 15 largest modules. Each module is considered in a separate panel. Large filled circles 
represent the stations included in the corresponding module; open circles represent the stations not included in the corresponding 
module. Red and green small circles representing phages and bacteria, respectively, were randomly placed around their corresponding 
station for improved visibility. A gray line between a red and green circle denotes an interaction between a virus and a bacterium. 

To what extent are the interactions between 
phages and hosts at a given site more likely to occur 
than those between sites? First, we find that the 
probability of a phage infecting and exploiting a host 
from a different station is lower (0.017) than that of 
infecting and exploiting a host from the same station 
(0.17). This is a 10-fold effect of geographic 
isolation. We caution that the isolation procedures 
for phages are heavily biased toward obtaining this 
effect, as phages were isolated from hosts at a given 
station. As one means to control for this effect, we 
reduced the number of internal station interactions 
by the total number of viruses and re-performed this 
analysis. In doing so, we find a revised probability of 
0.061 within modules, which is a 3.6-fold increase 
when compared with interactions between modules. 
Finally, in Supplementary Figure S8, we show that 
the fraction of shared interactions for both hosts and 
phages is larger within stations than it is between 
stations. Altogether these results show that geographic 
location, whether at a given site or among a subset of 
sites, has an important role in driving infection 
patterns. 
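
The same-station versus different-station comparison can be sketched as the mean of the binary infection matrix over the two sets of host-phage pairs. The function name, the toy matrix and the station labels below are hypothetical; the bias correction described in the text is not reproduced, and the fold changes are simply recomputed from the probabilities quoted above.

```python
import numpy as np


def infection_probabilities(matrix, host_stations, phage_stations):
    """Probability of infection for same-station and different-station pairs."""
    B = np.asarray(matrix) != 0
    hs = np.asarray(host_stations)
    ps = np.asarray(phage_stations)
    same_station = hs[:, None] == ps[None, :]
    p_same = float(B[same_station].mean())
    p_diff = float(B[~same_station].mean())
    return p_same, p_diff


# Toy example (hypothetical): 2 hosts and 2 phages from stations 1 and 2.
p_same, p_diff = infection_probabilities([[1, 0], [1, 1]], [1, 2], [1, 2])

# Fold changes implied by the probabilities quoted in the text.
raw_fold = round(0.17 / 0.017)
corrected_fold = round(0.061 / 0.017, 1)
```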



Discussion 

We performed the first multi-scale analysis of a 
phage-bacteria infection network, comprised of 286 
bacteria and 215 phages isolated from the Atlantic 
Ocean. First, we found that bacteria and viruses were 
highly variable in their interactions, corresponding 
to a spectrum of generalist and specialist viruses 



as well as hard-to-infect to permissive bacteria 
(Supplementary Figure S3). Second, we found that 
the infection network was modular at a large scale 
and had multi-scale structure such that modules 
were themselves nested and/or had further modular 
organization. Network studies have suggested that 
modularity can be topological, for example, func- 
tional modularity as found in protein-protein inter- 
action networks (Rives and Galitski, 2003) or 
transcriptional regulatory networks (Ihmels et al., 
2002). Here, a geographic diversity analysis revealed 
that the modular signal observed was driven, in part, 
by geographic isolation. However, it is important to 
point out that cross-infections that transcend site of 
isolation were common; indeed, approximately 76% 
of observed interactions occurred between a phage 
and a bacterium isolated at different sites. We discuss 
the relevance and implications of each of these 
results below. 

The observation has been made on multiple 
occasions that the number of hosts a virus can infect 
can vary substantially (for example, Moebus and 
Nattkemper, 1981; Wichels et al., 1998; Comeau 
et al., 2006; Holmfeldt et al., 2007; Middelboe et al., 
2009). Variability in the host range of phages is 
consistent with the notion that phages have evolved 
strategies ranging from specialist to 
generalist. Similarly, variability in the number of 
viruses that can infect a given host is consistent with 
the notion that hosts have evolved 
strategies ranging from well defended to permissive. 
It is thought that the relative ecological success of 
such strategies depends on environmental condi- 
tions; for example, bacterial defense specialists 
may be favored when resources are abundant and 
competition strategists may be favored when 
resources are limited (Winter et al., 2010). However, 
such conclusions are often based on models of 
interaction dynamics, such as Kill-the-Winner 
(Thingstad and Lignell, 1997; Thingstad, 2000), that 
do not include significant cross-infection. Incorporating 
cross-infection networks into dynamic models 
could help develop predictions relating infection 
structure to community composition (Weitz and 
Wilhelm, 2012). 
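As a concrete illustration of the two quantities discussed above, host range and host permissiveness are simply the column and row sums of a binary infection matrix. The sketch below assumes hosts as rows and phages as columns; the function names and toy matrix are illustrative only.

```python
def host_range(matrix):
    """Number of hosts each phage infects (column sums)."""
    return [sum(col) for col in zip(*matrix)]


def permissiveness(matrix):
    """Number of phages able to infect each host (row sums)."""
    return [sum(row) for row in matrix]


# Toy matrix (hypothetical): 4 hosts (rows) x 3 phages (columns).
toy = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
]
print(host_range(toy))      # the middle phage is the generalist here
print(permissiveness(toy))  # hosts 0 and 2 are the more permissive ones
```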

Although we identified generalist viruses, the 
most generalist virus could infect only 31 of the 286 total 
hosts in the network, suggesting that nestedness at 
the whole-network scale is unlikely. Indeed, the MN 
matrix comprises disjoint components 
(Figure 2), some of which 
exhibit additional modular structure 
(Figure 3). These modules may them- 
selves have further modularity and/or nestedness 
(Figures 4 and 5). This is the first instance, of which 
we are aware, of detection of such multi-scale 
structure in microbial interaction networks. This 
result can be interpreted in a number of ways. First, 
the finding of modules within modules suggests 
multiple levels of specialization that may be present 
in the community. Second, nestedness 
and modularity are not mutually exclusive. In our prior study 
(Flores et al., 2011), we found nearly perfectly 
nested networks that appear 'modular' using the 
standard BRIM metric (Barber, 2007). This warrants 
separate examination to develop metrics that can 
disentangle these two network properties. We 
developed one such approach here, by suggesting 
that estimates of nestedness could be performed 
under modular constraints, and in so doing found 
modularity at the scale of the entire MN network 
and nestedness at a local scale (that is, 
within modules). 
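The idea of estimating nestedness under modular constraints can be illustrated with NODF, one of the nestedness measures used in this paper. The sketch below is a simplified illustration rather than the authors' pipeline. It assumes module labels are already available (module detection itself is not implemented), uses hypothetical argument names `row_module` and `col_module`, and omits the null-model significance tests.

```python
from itertools import combinations


def _paired_overlap(vectors):
    """Sum of NODF paired-overlap terms over all pairs of binary vectors."""
    total = 0.0
    for a, b in combinations(vectors, 2):
        deg_a, deg_b = sum(a), sum(b)
        if deg_a == deg_b or min(deg_a, deg_b) == 0:
            continue  # no decreasing fill, so the pair contributes zero
        shared = sum(1 for x, y in zip(a, b) if x and y)
        total += shared / min(deg_a, deg_b)
    return total


def nodf(matrix):
    """NODF nestedness (0 to 100) of a binary matrix given as a list of rows."""
    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*rows)]
    n, m = len(rows), len(cols)
    n_pairs = n * (n - 1) / 2 + m * (m - 1) / 2
    if n_pairs == 0:
        return 0.0
    return 100.0 * (_paired_overlap(rows) + _paired_overlap(cols)) / n_pairs


def nodf_within_modules(matrix, row_module, col_module):
    """NODF of each module's submatrix; labels are assumed to be given."""
    result = {}
    for label in sorted(set(row_module) & set(col_module)):
        r_idx = [i for i, lab in enumerate(row_module) if lab == label]
        c_idx = [j for j, lab in enumerate(col_module) if lab == label]
        sub = [[matrix[i][j] for j in c_idx] for i in r_idx]
        result[label] = nodf(sub)
    return result


# Toy network (hypothetical): two disjoint, perfectly nested modules.
block = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
]
labels = [0, 0, 0, 1, 1, 1]
print(nodf(block))                                 # whole-network nestedness
print(nodf_within_modules(block, labels, labels))  # nestedness per module
```

In this toy case the whole-network value is well below the per-module values, which mirrors the pattern of modularity at the large scale and nestedness within modules.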

What is the biological basis for modules? Given 
the data available, we evaluate the role of geography 
in structuring infection. Moebus and Nattkemper 
(1981) hypothesized, based on visual inspection, 
that geographic location drove part of the interaction 
signal. Recent work has suggested that viruses are 
more likely to infect hosts from the same site than 
they are hosts isolated at different sites (Vos et al., 
2009; Gomez and Buckling, 2011; Koskella et al., 
2011). We found a similar result, in that viruses were 
at least three times more likely to infect a host 
isolated from the same location than a host isolated 
from a different location, even after accounting for 
isolation bias. However, infection across sample 
sites was observed frequently, and modules typi- 
cally contained hosts and phages from multiple 
sample sites. Using a geographic diversity method, 
we found that modules tend to have phages and 
hosts from a much smaller number of sample sites 
than would be expected by chance. Hence, our study 
is consistent with recent calls for greater attention to 
spatial structure in viral biogeography (Desnues 
et al., 2008; Held and Whitaker, 2009). One inter- 
pretation of our results is that interactions between 
phages and hosts may be endemic despite a con- 
sensus that viruses are usually cosmopolitan, that is, 
they can be observed across a broad range of 
locations (Breitbart et al., 2004; Angly et al., 2006). 
This may be the case because geographically 
separated sites are comprised of relatively distinct 
microbes (for example, microbes differ at the genus 
level or higher) so that isolated viruses are unlikely 
to infect the taxa of microbes across sites. Or, it may 
be that geographically separated sites have relatively 
similar microbial isolates (for example, commu- 
nities are dominated by culturable microbes related 
at the species level or lower) but that their 
geographic separation facilitated local coevolution 
to take place, which enabled divergences in func- 
tional interactions (Held and Whitaker, 2009; 
Paterson et al., 2010; Breitbart, 2012). 
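The claim that modules draw on fewer sample sites than expected by chance can be checked with a simple permutation test. The sketch below is one possible formulation under stated assumptions and is not the geographic diversity method used in the paper, whose details are not given in this section. It uses the count of distinct sites per module as a stand-in diversity measure and shuffles site labels across all isolates as the null model. The names `module_of` and `site_of` are hypothetical.

```python
import random


def distinct_sites_per_module(module_of, site_of):
    """Map each module label to the number of distinct sites among its members."""
    sites = {}
    for mod, site in zip(module_of, site_of):
        sites.setdefault(mod, set()).add(site)
    return {mod: len(s) for mod, s in sites.items()}


def site_permutation_test(module_of, site_of, n_perm=999, seed=0):
    """One-sided test: does a module span as few sites as observed by chance?

    Null model (an assumption): site labels are shuffled across all isolates,
    keeping module membership and the overall site frequencies fixed.
    """
    rng = random.Random(seed)
    observed = distinct_sites_per_module(module_of, site_of)
    at_most = {mod: 0 for mod in observed}
    shuffled = list(site_of)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null = distinct_sites_per_module(module_of, shuffled)
        for mod in observed:
            if null[mod] <= observed[mod]:
                at_most[mod] += 1
    # Add-one correction so that p-values are never exactly zero.
    p_values = {mod: (at_most[mod] + 1) / (n_perm + 1) for mod in observed}
    return observed, p_values


# Toy data (hypothetical): two modules, each drawn from a single site.
toy_modules = [0] * 10 + [1] * 10
toy_sites = ["A"] * 10 + ["B"] * 10
print(site_permutation_test(toy_modules, toy_sites))
```

A small p-value for a module indicates that its isolates come from fewer sites than the shuffled labels would typically produce.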

The finding of multi-scale structure also suggests 
that different processes may drive the emergence of 
functional interactions at different scales. For exam- 
ple, in the gene-for-gene model of coevolutionary 
adaptation (Agrawal and Lively, 2002), hosts and 
phages accumulate differences in defense and 
counter-defense that are consistent with the emergence of 
nestedness. However, innovations by hosts may also 
have an important, albeit less frequent, role in 
permitting hosts to escape from phage infection and 
selective pressure. Similarly, innovations by phages 
may also permit them to re-establish access to a 
host population (Meyer et al., 2012). A number of 
evolutionary models of phages and hosts have 
proposed mechanisms by which coevolutionary 
dynamics unfold (Thingstad, 2000; Weitz et al., 
2005; Rodriguez-Valera et al., 2009; Childs et al., 
2012). We suggest that examining resultant phage- 
bacteria interaction networks will be an important 
means to quantify functional complexity in natural 
systems and to identify signatures that could dis- 
criminate between alternative coevolutionary models. 

Ecological patterns depend on the scale of inquiry 
(Levin, 1992). In the case of phage-bacteria infection 
networks, relevant scales may be taxonomic, envir- 
onmental and/or geographic. Hence, measurements 
of interaction networks coupled with information on 
geography, taxa and environmental conditions (for 
example, Poisot et al., 2011) could help disentangle 
the relative importance of drivers of microbial 
interactions, in much the same way that biogeo- 
graphic studies are beginning to quantify the relative 
importance of drivers of microbial species distribu- 
tions (Martiny et al., 2006). Of course, in doing so, 
new methods to measure cross-infection will be 
needed. First, our discussion of phage-host interac- 
tions in this paper has largely focused on the 
antagonistic mode. However, the MN matrix includes 
turbid plaques, which could be interpreted as 
indicative of infection by temperate phages. Follow- 
up studies on the differences and similarities 
between virulent vs temperate phages in natural 
environments are worthwhile. Second, it was 
recently noted that 'the true host range for most 
marine phages is completely uncharacterized' 
(Breitbart, 2012). Previously published cross-infec- 
tion assays, including the MN matrix examined here, 
use traditional spot-assay or plaque-assay based 
methods for assessing interactions between cultured 
bacteria and phages. In moving forward, we suggest 
that methods to evaluate the functional interaction 
between hosts and phages that do not rely on 
cultured isolates (Tadmor et al., 2011; Deng et al., 
2012) will represent an important step to assessing 
the general structure of interactions in natural 
communities. We hope that the network approach 
developed here will be of use in such an effort. 